[macOS][unified_log] Initial release of macOS Unified logs integration. #15794
This is a good start, but I don't think it's complete enough, or that it shows the right metrics for a security analyst:
The total request/response/network bytes aren't that useful without time context or comparative baselines.
The same feedback applies to "Events by response status" and "Events by privacy status": these are basically saying "everything worked" but don't give actionable info.
Also, the top source IP datatable doesn't show which processes are communicating, destination IPs, protocols/ports, or time of activity, all of which are more relevant.
Can we try editing some of this? @muskan-agarwal26 @piyush-elastic, can you let me know if any of these visualizations are possible based on the information available in the logs?
- "Network connections over time" (maybe an area chart?) showing connections/minute, color-coded by new/established/closed
- "Active network connections" table showing process, local port, remote IP, state, duration
- "Top external destinations" table showing domains/IPs and # of connections
cc @jamiehynds
Sure, @cpascale43 , I’ll follow your suggestions and make the necessary changes.
Similar feedback to the Network dashboard - can we make sure this has more security context?
The biggest enhancements would be an events timeline, and a more security-specific events breakdown table.
The pie charts up top aren't very meaningful on their own, so I think we could replace them with an area chart showing aggregated security events over time. Is there a way a user could see key events represented on the top and when they occurred, like
- "3 failed authentication attempts at 12:23"
- "New process launched from /tmp at 9:15"
Also, the "Events by Subsystem" and "Events by Category" bar charts are a bit strange - I think it would be more useful to bucket the events into categories like "Authentication", "Process", "Network", "File system" etc, in line with the categories outlined in the issue.
Can we make a "Security Event Types" table showing a breakdown of the number of events per category? Something like
Authentication Events
---------
- Successful logins | 45
- Failed logins | 3
- Privilege escalation | 12
@cpascale43,
As suggested, we will remove the following visuals:
All pie charts and bar charts
And add the following visual:
Aggregated Security Events Over Time: this will display the breakdown of each category over time. It will be built on a custom field derived within the pipeline by splitting events according to their predicates.
We couldn’t create the Security Event Types visual as suggested, since we’re unable to break down subcategories within categories. For example, under the authentication category, subcategories like “successful logins” or “failed logins” aren’t available, as such details are not present in the logs.
.github/CODEOWNERS
Outdated
| /packages/lumos @elastic/security-service-integrations | ||
| /packages/lyve_cloud @elastic/security-service-integrations | ||
| /packages/m365_defender @elastic/security-service-integrations | ||
| /packages/macos @elastic/security-service-integrations |
This should be owned by @nfritts's team; the input is owned by their team, as are the system unified logs. Any reason we are linked here?
packages/macos/_dev/build/build.yml
Outdated
| @@ -0,0 +1,3 @@ | |||
| dependencies: | |||
| ecs: | |||
| reference: git@v8.17.0 | |||
We probably want to use 9.2.0 since it is the latest.
| @@ -0,0 +1,80 @@ | |||
| predicate: | |||
| {{#if authentication}} | |||
Do we want this, or multiple data streams instead? Are there any advantages to this approach over the other?
Since the logs are coming from a single source and only differ by filters such as authentication, user, or network, a single data stream is sufficient. We usually create separate streams only when data originates from different endpoints.
We also have many cases where data streams are more of a logical separation, e.g. mimecast, google_workspace, postgresql, etc. In this case I think separate data streams could simplify the pipeline logic quite a lot: we would always know the event types, so we could reduce the logic needed to identify them. That would make for cleaner pipelines that are easier to maintain in the future and less prone to breaking if anything changes. Is this something we could consider?
Just some feedback: even if the data comes from a single source, if the context/dataset is different (authentication, user, and network), the events should be stored in different data streams, because they may have different retention requirements, different volumes, and different custom processing needs.
Having all the data in a single data stream already causes issues for other integrations like Fortigate and Palo Alto. Since this integration can generate an insane amount of logs, it would cause the same problem: the user may want to store some events for longer, but cannot because everything is in the same data stream.
For example, in our case we would have one retention period for authentication and user events and another for network events; this would not be possible with a single data stream.
Agreed @leandrojmp , we are working on separating the events into different data streams and will update the PR soon.
packages/macos/manifest.yml
Outdated
| title: Collect unified logs from macOS | ||
| description: Collecting unified logs from macOS. | ||
| owner: | ||
| github: elastic/security-service-integrations |
Same here: the owner should be the @elastic/sec-windows-platform team. @marc-gr, is @elastic/sec-linux-platform taking over this one?
.github/CODEOWNERS
Outdated
| /packages/lumos @elastic/security-service-integrations | ||
| /packages/lyve_cloud @elastic/security-service-integrations | ||
| /packages/m365_defender @elastic/security-service-integrations | ||
| /packages/macos @elastic/sec-linux-platform |
should be @elastic/sec-windows-platform
marc-gr
left a comment
left some initial comments
| @@ -0,0 +1,312 @@ | |||
| --- | |||
| description: Pipeline for processing authentication logs. | |||
| description: Pipeline for processing authentication logs. | |
| description: Pipeline for processing common fields |
| "kind": "event", | ||
| "original": "{\"timezoneName\":\"\",\"messageType\":\"Error\",\"eventType\":\"logEvent\",\"source\":null,\"formatString\":\"rejecting write of key(s) %{public}s in { %{public}s, %{public}s, %{public}s, %{public}s, managed: %d } from process %{public}d (%{public}s) because %{public}s\",\"userID\":502,\"activityIdentifier\":0,\"subsystem\":\"com.apple.defaults\",\"category\":\"cfprefsd\",\"threadID\":273730,\"senderImageUUID\":\"FEDAF68C-F484-3FCA-8866-A9E7E46CE7B6\",\"backtrace\":{\"frames\":[{\"imageOffset\":1818634,\"imageUUID\":\"FEDAF68C-F484-3FCA-8866-A9E7E46CE7B6\"}]},\"bootUUID\":\"218031E6-E47F-4A77-B7FC-5A57B049F4BC\",\"processImagePath\":\"\\/usr\\/sbin\\/cfprefsd\",\"senderImagePath\":\"\\/System\\/Library\\/Frameworks\\/CoreFoundation.framework\\/Versions\\/A\\/CoreFoundation\",\"timestamp\":\"2025-10-01 18:19:11.945508+0530\",\"machTimestamp\":454777460134003,\"eventMessage\":\"rejecting write of key(s) CKStartupTime in { secd, test, kCFPreferencesAnyHost, \\/Users\\/test\\/Library\\/Preferences\\/secd.plist, managed: 0 } from process 4954 (secd) because setting these preferences requires user-preference-write or file-write-data sandbox access\",\"processImageUUID\":\"04C516B8-C8E5-30EF-AC49-1631528F5645\",\"traceID\":35866893965594628,\"processID\":4944,\"senderProgramCounter\":1818634,\"parentActivityIdentifier\":0}", | ||
| "type": [ | ||
| "info" |
Shouldn't this match the level? In this case, error.
Removing event.type from advanced_monitoring pipeline, as we haven't mapped event.category.
So matching isn't needed here.
| predicate: | ||
| - 'eventMessage CONTAINS[c] "exec" OR eventMessage CONTAINS[c] "fork" OR eventMessage CONTAINS[c] "exited" OR eventMessage CONTAINS[c] "terminated"' | ||
| - 'subsystem == "com.apple.securityd" AND (composedMessage CONTAINS "code signing" OR composedMessage CONTAINS "not valid")' | ||
| - 'composedMessage CONTAINS "com.apple.quarantine"' |
It could be interesting to have these as the default value and let the user override them, instead of hardcoding them here.
We have added an option for users to add their own predicates as well.
Do you mean to remove the hardcoded ones?
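One way to read the suggestion above is to move the hardcoded predicates into the default of a user-editable variable. The fragment below is only a sketch of that idea: the variable name `predicate`, its title, and the choice of a multi-value text variable are assumptions, not taken from this PR, and the default entry is copied from one of the hardcoded predicates shown in the diff.

```yaml
# Data stream manifest fragment (illustrative; names are assumptions).
vars:
  - name: predicate          # hypothetical variable name
    type: text
    title: Predicates        # hypothetical title
    multi: true              # one entry per predicate
    required: false
    show_user: true
    default:
      # Previously hardcoded in the template; now an overridable default.
      - 'composedMessage CONTAINS "com.apple.quarantine"'
```

With this shape a user who only wants extra predicates appends to the list, while a user who wants different filtering replaces the defaults entirely.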
| "id": "248" | ||
| }, | ||
| "log": { | ||
| "level": "Default" |
would make sense to standardize this value
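A minimal sketch of one way to standardize the value, assuming that lowercasing is the normalization wanted (the comment above does not say which form to use). The `tag` name is made up for illustration.

```yaml
processors:
  # Turn "Default", "Error", ... into "default", "error", ...
  - lowercase:
      field: log.level
      tag: lowercase_log_level   # hypothetical tag
      ignore_missing: true
```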
| "user": [ | ||
| "501", | ||
| "248", | ||
| "\"Setup User\"", |
we might want to sanitize this and remove the quotes
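A possible way to do this, sketched with the `gsub` processor. It assumes the quotes only ever appear at the very start and end of a value, as in `"\"Setup User\""` above; running it on `related.user` directly (rather than on the source field before it is appended) is also an assumption.

```yaml
processors:
  # Strip one leading and one trailing double quote from each value.
  # gsub applies to every member when the field is an array of strings.
  - gsub:
      field: related.user
      tag: gsub_related_user_quotes   # hypothetical tag
      pattern: '^"|"$'
      replacement: ''
      ignore_missing: true
```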
P1llus
left a comment
Focusing mostly on the data and ingest pipelines. Adding my two cents; feel free to ignore them if needed.
| @@ -0,0 +1,50 @@ | |||
| predicate: | |||
| - 'process contains "sudo" OR composedMessage CONTAINS "sudo" OR process contains "su"' | |||
This has the same predicate as the authentication datastream, how does that work? Does it ingest the data twice?
I can see why splitting it up into different datastreams was discussed earlier and I agree as it would be too big otherwise, but if they have the same predicate then maybe it wouldn't be as useful?
Removing it from user_and_account_management, it should be a part of authentication, thanks.
| - set: | ||
| field: process.pid | ||
| tag: set_process_pid_from_unified_log_process_id | ||
| copy_from: macos.process.id |
Any reason we need to store this sort of information twice?
It is stored first in the custom mapping and second in the ECS field. The same applies to the two fields below.
| - set: | ||
| field: process.thread.id | ||
| tag: set_process_thread_id_from_unified_log_thread_id | ||
| copy_from: macos.thread_id |
Same here
| - set: | ||
| field: '@timestamp' | ||
| tag: set_@timestamp_from_unified_log_timestamp | ||
| copy_from: macos.timestamp |
And a third one?
| type: string | ||
| ignore_missing: true | ||
| - append: | ||
| field: user.id |
This is not supposed to be an array; it's a single keyword representing the user most related to the event. I see related.user is also there, which is where the values should end up if there is more than one.
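A rough sketch of the split described above: `user.id` holds one value and `related.user` collects the rest. The source field `macos.user_id` is a placeholder, not a field confirmed by this PR, and the sketch assumes the value has already been converted to a string.

```yaml
processors:
  # Single keyword: the user most related to the event.
  - set:
      field: user.id
      tag: set_user_id                  # hypothetical tag
      copy_from: macos.user_id          # hypothetical source field
      ignore_empty_value: true
  # Every user seen in the event, including the primary one.
  - append:
      field: related.user
      tag: append_user_id_to_related_user   # hypothetical tag
      value: '{{{user.id}}}'
      allow_duplicates: false
      if: ctx.user?.id != null
```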
| external: ecs | ||
| - name: event.module | ||
| type: constant_keyword | ||
| external: ecs |
If you are setting the type and the value yourself, then it's most likely not external? I'm unsure whether that might overwrite it.
The main reason for using external is to retrieve definitions from the ECS. Specifying a type and value will override those fields, but the description will still be sourced from the external reference.
| type: long | ||
| - name: home_directory_path | ||
| type: keyword | ||
| - name: hostname |
Do we really want to double-map all these ECS fields so that we have both the custom data and the ECS fields?
It is common practice to have both mapped in all integrations, hence followed the same here as well.
| pattern_definitions: | ||
| GREEDYMULTILINE: '(.|\n)*' | ||
| patterns: | ||
| - '^\[%{WORD}\] %{DATA}\:(?:%{SPACE}mach=%{WORD:macos.event.message.mach:boolean})?(?:%{SPACE}listener=%{WORD:macos.event.message.listener:boolean})?(?:%{SPACE}peer=%{WORD:macos.event.message.peer:boolean})?(?:%{SPACE}name=%{GREEDYDATA:macos.event.message.name})?' |
Is this really the only way we could parse this? I see that it's quite unstructured, but it seems that at least some of this could be put into custom patterns and reused across many of these.
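To illustrate the reuse idea, the grok processor's `pattern_definitions` can hold the repeated optional `key=value` pieces so each full pattern only references them by name. This is a sketch only: the helper names (`OPT_FLAGS`, `OPT_SEQ`, `OPT_ACK`) and the input field `message` are assumptions, and only three of the keys from the patterns in this PR are shown.

```yaml
processors:
  - grok:
      field: message                    # hypothetical input field
      tag: grok_tcp_state_message       # hypothetical tag
      pattern_definitions:
        # Each helper is one optional "key=value" fragment.
        OPT_FLAGS: '(?:%{SPACE}flags=\[%{DATA:macos.event.message.flags}\])?'
        OPT_SEQ: '(?:%{SPACE}seq=%{DATA:macos.event.message.seq},)?'
        OPT_ACK: '(?:%{SPACE}ack=%{DATA:macos.event.message.ack},)?'
      patterns:
        # The helpers can be shared by every pattern in this processor.
        - '^%{WORD} \[%{DATA}\]%{OPT_FLAGS}%{OPT_SEQ}%{OPT_ACK}'
      ignore_missing: true
```

Definitions are scoped to the processor they are declared in, so patterns that share fragments would need to live in the same `grok` processor to benefit.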
| - '^%{WORD} \[%{DATA}\](?:%{SPACE}flags=\[%{DATA:macos.event.message.flags}\])?(?:%{SPACE}seq=%{DATA:macos.event.message.seq},)?(?:%{SPACE}ack=%{DATA:macos.event.message.ack},)?(?:%{SPACE}win=%{DATA:macos.event.message.win})?(?:%{SPACE}state=%{DATA:macos.event.message.state})?(?:%{SPACE}rcv_nxt=%{DATA:macos.event.message.rcv_nxt},)?(?:snd_una=%{DATA:macos.event.message.snd_una})' | ||
| - '^%{WORD} \[%{DATA}\](?:%{SPACE}flags=\[%{DATA:macos.event.message.flags}\])?(?:%{SPACE}seq=%{DATA:macos.event.message.seq},)?(?:%{SPACE}ack=%{DATA:macos.event.message.ack},)?(?:%{SPACE}win=%{DATA:macos.event.message.win})?(?:%{SPACE}state=%{DATA:macos.event.message.state})?(?:%{SPACE}rcv_nxt=%{DATA:macos.event.message.rcv_nxt},)?(?:snd_una=%{DATA:macos.event.message.snd_una})' | ||
| - '^nw_protocol_boringssl_signal_connected\(%{NUMBER}\) \[%{DATA:macos.event.message.connection_identifier}\]\[%{DATA}\] TLS connected \[(?:version\(%{DATA:macos.event.message.tls_version}\))?(?:%{SPACE}ciphersuite\(%{DATA:macos.event.message.cipher_suite}\))?(?:%{SPACE}group\(%{DATA:macos.event.message.group}\))?(?:%{SPACE}signature_alg\(%{DATA:macos.event.message.signature_alg}\))?(?:%{SPACE}alpn\(%{DATA:macos.event.message.alpn}\))?(?:%{SPACE}resumed\(%{DATA:macos.event.message.resumed}\))?(?:%{SPACE}offered_ticket\(%{DATA:macos.event.message.offered_ticket}\))?(?:%{SPACE}false_started\(%{DATA:macos.event.message.false_started}\))?(?:%{SPACE}ocsp_received\(%{DATA:macos.event.message.ocsp_received}\))?(?:%{SPACE}sct_received\(%{DATA:macos.event.message.sct_received}\))?(?:%{SPACE}connect_time\(%{DATA:macos.event.message.connection_time}\))?(?:%{SPACE}flight_time\(%{DATA:macos.event.message.flight_time}\))?(?:%{SPACE}rtt\(%{DATA:macos.event.message.rtt}\))?(?:%{SPACE}write_stalls\(%{DATA:macos.event.message.write_stalls:int}\))?(?:%{SPACE}read_stalls\(%{DATA:macos.event.message.read_stalls:int}\))?(?:%{SPACE}pake\(%{DATA:macos.event.message.pake}\))?\]' | ||
| - '^Task \<%{DATA:macos.event.message.task_uid}\>.\<%{NUMBER}\>%{SPACE}summary for %{DATA} \{(?:transaction_duration_ms=%{NUMBER:macos.event.message.transaction_duration_ms:int},)?(?:%{SPACE}response_status=%{NUMBER:macos.event.message.response_status:int},)?(?:%{SPACE}connection=%{NUMBER:macos.event.message.connection:int},)?(?:%{SPACE}protocol=%{DATA:macos.event.message.protocol},)?(?:%{SPACE}domain_lookup_duration_ms=%{NUMBER:macos.event.message.domain_lookup_duration_ms:int},)?(?:%{SPACE}connect_duration_ms=%{NUMBER:macos.event.message.connection_duration_ms:int},)?(?:%{SPACE}secure_connection_duration_ms=%{NUMBER:macos.event.message.secure_connection_duration_ms:int},)?(?:%{SPACE}private_relay=%{WORD:macos.event.message.private_relay:boolean},)?(?:%{SPACE}request_start_ms=%{NUMBER:macos.event.message.request_start_ms:int},)?(?:%{SPACE}request_duration_ms=%{NUMBER:macos.event.message.request_duration_ms:int},)?(?:%{SPACE}response_start_ms=%{NUMBER:macos.event.message.response_start_ms:int},)?(?:%{SPACE}response_duration_ms=%{NUMBER:macos.event.message.response_duration_ms:int},)?(?:%{SPACE}request_bytes=%{NUMBER:macos.event.message.request_bytes:long},)?(?:%{SPACE}response_bytes=%{NUMBER:macos.event.message.response_bytes:long},)?(?:%{SPACE}cache_hit=%{WORD:macos.event.message.cache_hit:boolean})?\}' |
It was something like 1 in a few million, but I did recently observe the connect_duration_ms value being larger than a valid int. Would it be worth updating the durations here to be long?
Replacing int with long, this will handle the above scenario.
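For reference, a signed 32-bit int tops out at 2,147,483,647, so a larger duration cannot be converted with the `:int` suffix. A minimal sketch of the change for one field is below; the input field `message` is an assumption, and the corresponding field mapping would also need to be `long`.

```yaml
processors:
  - grok:
      field: message   # hypothetical input field
      tag: grok_connect_duration   # hypothetical tag
      patterns:
        # ":long" instead of ":int" so large durations still convert.
        - 'connect_duration_ms=%{NUMBER:macos.event.message.connection_duration_ms:long}'
```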
@muskan-agarwal26 you mentioned in one of the PR comments about this:
I am a bit concerned about that part, because every security event should have both event.category and event.type. These two are usually mandatory, with a few exceptions, as both UI elements and many features in the security solutions expect them to be populated.
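As a sketch of what populating both fields could look like in a per-data-stream pipeline: the values below (`authentication` and `info`) are valid ECS values chosen only for illustration, and the right pair for each event would depend on what the log line actually describes.

```yaml
processors:
  # Both fields are arrays in ECS, so append is used rather than set.
  - append:
      field: event.category
      tag: append_event_category_authentication   # hypothetical tag
      value: authentication
      allow_duplicates: false
  - append:
      field: event.type
      tag: append_event_type_info                 # hypothetical tag
      value: info
      allow_duplicates: false
```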
Hi @P1llus , I have removed mapping in
Hi @cpascale43 , @narph , @nfritts , @leandrojmp , @P1llus , @marc-gr , @btrieger
Proposed commit message
The initial release includes unified_log data stream and associated dashboard.
macOS fields are mapped to their corresponding ECS fields where possible.
Test samples were derived from live data samples.
Checklist
changelog.yml file.
How to test this PR locally
To test the macOS package:
Related issues
Screenshots